ABSTRACT

The literature regarding the use of multiple comparisons in 
analysis of variance is reviewed. Two reasons why planned
comparisons are generally superior to the use of unplanned or 
post hoc tests are presented. It is suggested that orthogonal 
tests are generally more useful than non-orthogonal tests. It is 
argued that planned comparisons can be used even when omnibus 
tests are not statistically significant, or in place of such 
tests. Use of planned comparisons tends to result in more 
thoughtful research with greater power against Type II error. 



Empirical studies of research practice (Edgington, 1974;
Goodwin & Goodwin, 1985; Willson, 1980) indicate that the
analysis of variance (ANOVA) methods presented by Fisher (1925)
several generations ago remain popular with social scientists,
notwithstanding withering criticisms of some of these
applications (Cohen, 1968; Thompson, 1986). Most users of
ANOVA-type methods (ANOVA, ANCOVA, MANOVA, MANCOVA, hereafter
labelled OVA methods) are aware that "A researcher cannot stop his
analysis after getting a significant F; he must locate the cause
of the significant F" (Huck, Cormier & Bounds, 1974). Gravetter
and Wallnau (1985, p. 423) concur that "Reject H0 indicates that
at least one difference exists among the treatments. With
[means] = 3 or more, the problem is to find where the differences
are." Moore (1983, p. 299) suggests that:

If we have statistical significance when we have
only two groups, and thus only two means, we can
visually inspect the data to determine which group
performed better than the other. But when we have
three or more groups, we need to investigate
specific mean comparisons.
Many researchers employ unplanned (also called a posteriori 
or post hoc) multiple comparison tests (e.g., Scheffe, Tukey, or
Duncan) to isolate means that are significantly different within
OVA ways (also called factors) having more than two levels. As
Glass and Hopkins (1984, p. 368) note, 

MC procedures are a relatively recent addition to
the statistical arsenal; most MC techniques were
developed during the 1950's, although their use in
behavioral research was rare prior to the 1960's.
Textbook authors tend to discuss unplanned comparison or contrast
procedures in somewhat pejorative terms. For example, Kirk
(1984, p. 360) speaks of the use of unplanned comparisons as
"ferreting out significant differences among means, or, as it is
often called, data snooping." The following quotations are
additional representatives of this genre of views:

Techniques that have been developed for data 
snooping following an over-all [significant 
omnibus] F test... are referred to as a posteriori 
or post hoc tests. (Kirk, 1968, p. 73) 

The post hoc method is suited for trying out 
hunches gained during the data analysis. (Hays,
1981, p. 439)

Post hoc comparisons, on the other hand, enable 
the researcher to engage in so-called data 
snooping by performing any or all of the 
conceivable comparisons between means. (Pedhazur,
1982, p. 305)

Prior to running the experiment, the investigator
in our example had no well-developed rationale for
focusing on a particular comparison between means.
His was a "fishing expedition"... Such comparisons
are known as post hoc comparisons, because
interest in them is developed "after the fact"--it
is stimulated by the results obtained, not by any
prior rationale. (Minium & Clarke, 1982, p. 321)



Post hoc comparisons often take the form of an 
intensive "milking" of a set of results--e.g., the
comparison of all possible pairs of treatment 
means. (Keppel, 1982, p. 150) 

Post hoc comparisons are made in accordance with 
the serendipity principle--that is, after
conducting your experiment you may find something
interesting that you were not initially looking
for. (McGuigan, 1983, p. 151) 

Planned (also called a priori) comparisons provide an 
alternative to the OVA user who is interested in isolating 
differences among means. As Keppel (1982, p. 164) notes in his 
excellent treatment, decisions about which unplanned or planned
comparisons to employ in OVA research are complex and not always 
well understood by researchers: 

The fact that there is little agreement among 
commentators writing in statistical books and
articles concerning specific courses of action to 
be followed with multiple comparisons simply means 
that the issues are complex, and that no single 
solution can be offered to meet adequately the 
varied needs of researchers. Consequently, you 
should view the situation... with a realization 
that you must work the problem out for yourself. 
The purpose of the present paper is to acquaint the reader with
some of these complex issues, and to argue that planned
comparisons should be employed more frequently in OVA research. 

Rationale Underlying Unplanned Comparisons

Most contemporary researchers recognize that

t-tests performed on all possible pairs of means
involved in the F-test... [to] reveal where
significant differences between means lie... is
quite unacceptable methodology. The t-test was not
designed for this use and is invalid when so
applied... In spite of the patent invalidity of
t-testing following a significant F-ratio in the
analysis of variance, or multiple t-testing in
lieu of the analysis of variance, this method has
often been and continues to be used. (Glass &
Stanley, 1970, p. 382)
However, not all researchers understand the basis for these
conclusions. The rationale involves the control of experimentwise
Type I error rate, and thus requires an understanding of the
nature of experimentwise error rate.

When a researcher conducts a study in which only one 
hypothesis is tested, the Type I error probability is the nominal 
alpha level selected by the researcher, i.e., often the 0.05
level of statistical significance. The probability of making a
Type I error when testing a given hypothesis is called the
testwise error rate. Experimentwise error rate refers to the
cumulative probability that a Type I error was made somewhere in 
the full set of hypothesis tests conducted in the study overall. 
In the case of a study in which only one hypothesis is tested,
the testwise error rate exactly equals the experimentwise error
rate.

However, when several hypotheses (e.g., two main effect and
one interaction effect) are tested within a single study, the
experimentwise error rate may not equal the nominal testwise
alpha level used to test each of the separate hypotheses. Witte
(1985, p. 236) provides an analogy that may clarify why this is
so:

When a fair coin is tossed only once, the
probability of heads equals 0.50--just as when a
single t test is to be conducted at the 0.05 level
of significance, the probability of a type I error
equals 0.05. When a fair coin is tossed three
times, however, heads can appear not only on the
first toss but also on the second or third toss,
and hence the probability of heads on at least one
of the three tosses exceeds 0.50. By the same
token, when a type I error can be committed not
only on the first test but also on the second or
third test, and hence the probability of
committing a type I error on at least one of the
three tests exceeds 0.05. In fact, the cumulative
probability of at least one type I error can be as
large as 0.15 for this series of three t tests.
In fact, as Thompson (in press) explains, the experimentwise
error rate would range somewhere between the nominal testwise
alpha level and 1 - (1 - testwise alpha) raised to the power of
the number of hypotheses tested. For example, if nine hypotheses
were each tested at the 0.05 level in a single study, the
experimentwise error rate would range between 0.05 and 0.37.
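The range just described can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes every hypothesis is tested at the same testwise alpha. The upper value 1 - (1 - alpha)^k is exact when the k tests are independent; the simple sum k times alpha (capped at 1), which is the 0.15 figure in the Witte quotation, is the Boole (Bonferroni) inequality bound and is never smaller.

```python
def experimentwise_range(alpha: float, k: int) -> tuple[float, float]:
    """Lower and upper values of the range described in the text:
    the testwise alpha, and 1 - (1 - alpha)**k for k independent tests."""
    return alpha, 1 - (1 - alpha) ** k


def bonferroni_sum(alpha: float, k: int) -> float:
    """Boole (Bonferroni) inequality bound: k * alpha, capped at 1."""
    return min(1.0, k * alpha)


for k in (1, 3, 9):
    low, high = experimentwise_range(0.05, k)
    print(f"k={k}: {low:.2f} to {high:.4f} (k * alpha = {bonferroni_sum(0.05, k):.2f})")
```

For nine tests at 0.05 the upper value is about 0.3698, which rounds to the 0.37 given above.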

Experimentwise error rate is at a maximum when the
hypotheses tested within an experiment are orthogonal or
uncorrelated. For example, the tests of all omnibus hypotheses in
a factorial multi-way ANOVA with equal numbers of subjects in
each cell are all uncorrelated. This is why the sums of squares
(SOS) for each effect plus the error SOS add up to exactly equal
the SOS total. Thus, in a 3x4 ANOVA in which both main effect
hypotheses and the one two-way interaction omnibus hypothesis are
each tested at the 0.05 level, the experimentwise error rate would
be about 0.14, i.e., 1 - (1 - 0.05) raised to the third power.
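To see the additivity claim concretely, the sketch below builds a small balanced 3x4 layout with two scores per cell. The scores are invented purely for illustration and are not taken from any study; the variable names are likewise illustrative. It computes the SOS for the two main effects, the interaction, and error directly from their definitions (not by subtraction), confirms that they add up to the total SOS, and then prints the approximate experimentwise rate for the three omnibus tests.

```python
from itertools import product

A_LEVELS, B_LEVELS, N_PER_CELL = 3, 4, 2

# Invented scores for a balanced 3x4 design, two per cell (illustration only).
scores = {
    (a, b): [10 + 2 * a + b + (a * b) % 3, 13 + a + 3 * b - (a * b) % 2]
    for a, b in product(range(A_LEVELS), range(B_LEVELS))
}


def mean(xs):
    return sum(xs) / len(xs)


all_scores = [y for ys in scores.values() for y in ys]
grand = mean(all_scores)
cell = {k: mean(ys) for k, ys in scores.items()}
a_mean = {a: mean([y for (i, _), ys in scores.items() if i == a for y in ys])
          for a in range(A_LEVELS)}
b_mean = {b: mean([y for (_, j), ys in scores.items() if j == b for y in ys])
          for b in range(B_LEVELS)}

# Each effect's SOS computed from its own definition, not by subtraction.
ss_a = B_LEVELS * N_PER_CELL * sum((m - grand) ** 2 for m in a_mean.values())
ss_b = A_LEVELS * N_PER_CELL * sum((m - grand) ** 2 for m in b_mean.values())
ss_ab = N_PER_CELL * sum((cell[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                         for a, b in product(range(A_LEVELS), range(B_LEVELS)))
ss_error = sum((y - cell[k]) ** 2 for k, ys in scores.items() for y in ys)
ss_total = sum((y - grand) ** 2 for y in all_scores)

print(f"A={ss_a:.4f}  B={ss_b:.4f}  AxB={ss_ab:.4f}  error={ss_error:.4f}")
print(f"sum of parts={ss_a + ss_b + ss_ab + ss_error:.4f}  total={ss_total:.4f}")
# Three omnibus tests (A, B, AxB), each at .05, treated as independent.
print(f"experimentwise rate: {1 - (1 - 0.05) ** 3:.4f}")
```

The exact additivity depends on the equal cell sizes; with unequal cell sizes the effect SOS are generally correlated and no longer sum to the total.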

Unplanned comparisons incorporate a correction (Games,
1971a, 1971b) that minimizes the inflation of experimentwise
error rate as a function of conducting more hypothesis tests in a
single study, especially given that omnibus hypotheses have
already been tested. As Horvath (1985, p. 223) notes, "Performing
a multitude of comparisons between the treatments raises the
spectre of an increased overall probability of a Type I error.
Post F-test procedures must include some accommodation for this
danger." As Kirk (1984, p. 360) explains:

The principal advantage of this multiple
comparison procedure over Student's t is that the
probability of erroneously rejecting one or more
null hypotheses doesn't increase as a function of
the number of hypotheses tested. Regardless of the
number of tests performed among p means, this
probability remains equal to or less than alpha
for the collection of tests.

Snodgrass, Levy-Berger and Haydon (1985, p. 386) note that:

The post hoc tests for such multiple comparisons
all adjust, to one degree or another, for the
increase in the probability of a Type I error as
the number of comparisons is increased. They
differ in the degree to which the probability of a
Type I error is reduced.

The authors discuss which tests are more conservative in this
adjustment and which are more liberal.
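The Scheffe, Tukey, and Duncan procedures each make their own adjustment. As a simpler stand-in, and explicitly not one of those procedures, the sketch below uses the Bonferroni adjustment: it counts the pairwise comparisons among J means and divides the desired experimentwise alpha evenly among them. The function name is hypothetical.

```python
from math import comb


def pairwise_bonferroni(n_groups: int, family_alpha: float = 0.05) -> tuple[int, float]:
    """Return the number of pairwise comparisons among n_groups means and the
    Bonferroni per-comparison alpha that keeps their summed Type I risk
    at or below family_alpha."""
    m = comb(n_groups, 2)
    return m, family_alpha / m


for j in (3, 4, 6):
    m, per_test = pairwise_bonferroni(j)
    print(f"{j} means: {m} pairwise tests, each at alpha = {per_test:.4f}")
```

With six groups, as in Table 1, each of the 15 pairwise tests would be run at roughly .0033, which shows why such corrections make each individual comparison harder to declare significant.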

Planned Comparison Procedures

Planned comparisons are the alternative to unplanned
comparisons for researchers who wish to isolate differences
between sets of specific means. Pedhazur (1982, chapter 9) and
Loftus and Loftus (1982, chapter 15) provide valuable
explanations of these methods. Various types of planned
comparisons can be used, including both orthogonal and
non-orthogonal planned comparisons. Planned comparisons typically
involve weighting data by sets of "contrasts" such as those
presented by Thompson (1985a) or the contrasts presented in Table
1. Other types of contrasts, those which test for trends in
means, are provided by Fisher and Yates (1957, pp. 90-100) and by
Hicks (1973) for various research designs.

INSERT TABLE 1 ABOUT HERE.

Contrasts are typically developed to sum to zero, as do all
five contrasts presented in Table 1. Contrasts are uncorrelated
or orthogonal (and the hypotheses they represent likewise) when
the contrasts each sum to zero and when the cross-products of
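each pair of contrasts all sum to zero also. By this criterion,
however, the Table 1 contrasts are not all mutually uncorrelated:
the cross-products of C1 and C2, for example, sum to 6 across the
12 subjects, so the sums of squares these contrasts receive depend
on the order in which they are entered. Planned contrasts are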
employed in a regression analysis in the manner illustrated by 
Thompson (1985a) and as explained by Pedhazur (1982). The
required computer cards for this case are presented in Appendix
A.
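The two conditions for orthogonality can be checked mechanically. The Python sketch below assumes equal group sizes, so weights can be given once per group rather than once per subject (the per-subject sums are then just n times the per-group sums, and are zero exactly when those are). It is applied to an illustrative Helmert-type set for six groups, in which each group is compared with the mean of the groups before it, and to two pairwise comparisons that share a group. The function name and both contrast sets are chosen for this example only.

```python
from itertools import combinations


def is_orthogonal_set(contrasts: list[list[float]]) -> bool:
    """True if every contrast sums to zero and the cross-products of every
    pair of contrasts also sum to zero (per-group weights, equal n)."""
    if any(abs(sum(c)) > 1e-12 for c in contrasts):
        return False
    return all(abs(sum(x * y for x, y in zip(c1, c2))) < 1e-12
               for c1, c2 in combinations(contrasts, 2))


# Illustrative Helmert-type set: J - 1 = 5 contrasts for J = 6 groups.
helmert = [
    [-1, 1, 0, 0, 0, 0],
    [-1, -1, 2, 0, 0, 0],
    [-1, -1, -1, 3, 0, 0],
    [-1, -1, -1, -1, 4, 0],
    [-1, -1, -1, -1, -1, 5],
]
# Two pairwise comparisons that share group 1 (cross-products sum to 1).
pairwise = [[1, -1, 0, 0, 0, 0], [1, 0, -1, 0, 0, 0]]

print(is_orthogonal_set(helmert))   # True
print(is_orthogonal_set(pairwise))  # False
```

The second result illustrates why the full set of all possible pairwise comparisons, the typical target of unplanned tests, cannot be an orthogonal set.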

The number of orthogonal planned comparisons always equals
the number of degrees of freedom for a given effect. As Hays
(1981, p. 425) notes:

Each and every degree of freedom associated with 
treatments in any fixed-effects analysis of 
variance corresponds to some possible comparison 
of means. The number of degrees of freedom for the 
mean square between is the number of possible 
independent [i.e., orthogonal] comparisons to be
made on the means. 
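Hays's point can be illustrated with the DV scores from Table 1. With equal n, the SOS for a single contrast is n times the squared contrast value divided by the sum of the squared weights. The sketch below assumes the same illustrative Helmert-type set of J - 1 = 5 orthogonal contrasts (not the Table 1 coding, although its last contrast has the same weights as C5). It shows that the five single-df SOS add up to the between-groups SOS and that their average equals the between-groups mean square. The helper name is hypothetical.

```python
# DV scores from Table 1, keyed by LEVEL (two subjects per level).
dv = {1: [10, 20], 2: [10, 20], 3: [10, 20], 4: [10, 20], 5: [10, 20], 6: [25, 35]}

# Illustrative orthogonal Helmert-type set (J - 1 = 5 contrasts for J = 6 groups).
helmert = [
    [-1, 1, 0, 0, 0, 0],
    [-1, -1, 2, 0, 0, 0],
    [-1, -1, -1, 3, 0, 0],
    [-1, -1, -1, -1, 4, 0],
    [-1, -1, -1, -1, -1, 5],
]


def contrast_ss(weights, groups):
    """SOS for one contrast with equal n: n * psi**2 / sum(c**2),
    where psi = sum(c_j * mean_j) over the groups in LEVEL order."""
    ordered = [groups[g] for g in sorted(groups)]
    n = len(ordered[0])
    means = [sum(ys) / n for ys in ordered]
    psi = sum(c * m for c, m in zip(weights, means))
    return n * psi ** 2 / sum(c * c for c in weights)


scores = [y for ys in dv.values() for y in ys]
grand = sum(scores) / len(scores)
ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2 for ys in dv.values())

parts = [contrast_ss(c, dv) for c in helmert]
print(parts)                    # [0.0, 0.0, 0.0, 0.0, 375.0]
print(ss_between, sum(parts))   # 375.0 375.0
print(sum(parts) / len(parts))  # 75.0, the between-groups mean square
```

All 375 units of between-groups SOS fall on the contrast of the board members with everyone else, matching the Between SOS in Table 2 and the C5 row of Table 3.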
Some researchers do not believe that planned comparisons
should necessarily be orthogonal. For example, Winer (1971, p.
175) argues that, "In practice the comparisons that are
constructed are those having some meaning in terms of the
experimental variables; whether these comparisons are orthogonal
or not makes little or no difference."

However, most researchers believe that orthogonal planned 
comparisons have special appeal. Kachigan (1986, p. 309) notes
that: 
The importance that we place on a set of
orthogonal comparisons is that both of these
[individual test and experimentwise] significance
levels is known to us... On the other hand, when
we deal with sets of unplanned non-orthogonal
comparisons, these probabilities are not generally
available to us, because of the unplanned nature
of the comparisons, and because of the
non-independence among them.
Keppel (1982, p. 147) suggests that: 

The value of orthogonal comparisons lies in the 
independence of inferences, which, of course, is a
desirable quality to achieve. That is, orthogonal
comparisons are such that any decision concerning
the null hypothesis representing one comparison is
uninfluenced by the decision concerning the null
hypothesis representing any other orthogonal
comparison. The potential difficulty with
nonorthogonal comparisons, then, is interpreting
the different outcomes. If we reject the null
hypotheses for two nonorthogonal comparisons,
which comparison represents the "true" reason for 
the observed differences? 

Two Reasons Why Planned Comparisons are Superior

There are two reasons why researchers generally prefer the
use of planned comparisons to the use of unplanned comparisons.
First, as noted by numerous researchers, planned comparisons
offer more power against making Type II errors:

procedures recommended for a priori orthogonal
comparisons are more powerful than procedures
recommended for a priori nonorthogonal and a
posteriori comparisons. That is, the former
procedures are more likely to detect real
differences among means. (Kirk, 1968, p. 95)

The probability of test's detecting that... [the
contrast's effect] is not zero [i.e., is
statistically significant] is greater with a
planned than with an unplanned comparison on the
same sample means. Thus, for any particular
comparison, the test is more powerful when planned
than when post hoc. (Hays, 1981, p. 438)

Post hoc tests protect us from making too many 
Type I errors by requiring a bigger difference 
before declaring it to be significant than do 
planned comparisons. But this protection tends to 
be too conservative for planned comparisons,
thereby lowering the power of the test. (Minium & 
Clarke, 1982, p. 322) 

The tests of significance for a priori, or
planned, comparisons are more powerful than those
for post hoc comparisons. In other words, it is
possible for a specific comparison to be not
significant when tested by post hoc methods but
significant when tested by a priori methods.
(Pedhazur, 1982, pp. 304-305)

Post hoc comparisons must always follow the
finding of a significant overall F-value... There
are no limits to the number of combinations that
can be tested post hoc, but none of these
procedures has the power of planned comparison
tests for detecting statistical significance.
(Sowell & Casey, 1982, p. 119)

The test of planned subhypotheses is more powerful
than the test of post hoc subhypotheses. For this
reason, we should make planned comparisons 
whenever possible in planning the design of 
research within the ANOVA context. (Glasnapp & 
Poggio, 1985, p. 474) 
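The power difference can be made concrete by comparing the critical values each approach requires for the same 1-df contrast, using the six groups and 6 error degrees of freedom of Table 2. The sketch uses Scheffe's criterion, (J - 1) times the tabled F for J - 1 and error df, as the post hoc example. The two .05 critical values are hard-coded from standard F tables (rounded to two decimals) rather than computed, and the function names are hypothetical.

```python
# Tabled .05 critical values of F from standard F tables, rounded to two decimals.
F_CRIT_05 = {(1, 6): 5.99, (5, 6): 4.39}


def planned_critical_f(df_error: int) -> float:
    """A single planned 1-df contrast is tested against F(1, df_error)."""
    return F_CRIT_05[(1, df_error)]


def scheffe_critical_f(n_groups: int, df_error: int) -> float:
    """Scheffe criterion for a post hoc contrast: (J - 1) * F(J - 1, df_error)."""
    return (n_groups - 1) * F_CRIT_05[(n_groups - 1, df_error)]


planned = planned_critical_f(6)
post_hoc = scheffe_critical_f(6, 6)
print(f"planned: F > {planned:.2f}; Scheffe post hoc: F > {post_hoc:.2f}")
print(f"post hoc hurdle is {post_hoc / planned:.1f} times higher")
```

Under these assumptions a post hoc Scheffe test needs an F roughly 3.7 times as large as a planned test of the same contrast before declaring it significant.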

Second, and perhaps even more importantly, planned
comparisons tend to force the researcher to be more thoughtful in
conducting research, since the number of planned comparisons that
can be tested is limited by the number of degrees of freedom for
an effect, as noted previously. As Snodgrass, Levy-Berger and
Haydon (1985, p. 386) suggest, "The experimenter who carries out
post hoc comparisons often has a rather diffuse hypothesis about
what the effects of the manipulation should be." Keppel (1982, p.
165) notes that, "Planned comparisons are usually the motivating
force behind an experiment. These comparisons are targeted from
the start of the investigation and represent an interest in
particular combinations of conditions--not in the overall
experiment." In summary, as Kerlinger (1986, p. 219) suggests,
"While post hoc tests are important in actual research,
especially for exploring one's data and for getting leads for
future research, the method of planned comparisons is perhaps
more important scientifically."

Use of Planned Comparisons in Lieu of Omnibus Tests 

Some researchers suggest that at least some unplanned
comparisons can be made even if an omnibus effect is not
statistically significant. For example, Spence, Cotton, Underwood
and Duncan (1983, p. 215) suggest that:

The Tukey HSD [honestly significant difference
test] usually is performed only if the F obtained
in the analysis of variance is significant, but it
is theoretically permissible to perform whatever the
significance of F.

Similarly, Hays (1981, p. 434) notes: 

This statement is not to be interpreted to mean 
that post hoc comparisons are somehow illegal or 
immoral if the original F test is not significant 
at the required alpha level... What one cannot do 
is to attach an unequivocal probability statement 
to such post hoc comparisons, unless the 
conditions underlying the method have been met. 

However, the preponderant view regarding use of unplanned post
hoc tests is expressed by Gravetter and Wallnau (1985, p. 423):

These [a posteriori] tests attempt to control the
overall alpha level by making the adjustments for
the number of different samples (potential
comparisons) in the experiment. To justify a
posteriori tests, the F-ratio from the overall
ANOVA must be significant.
On the other hand, with respect to the use of planned
comparisons, "Most statisticians agree that planned t tests
between means are appropriate, even when the overall F is
insignificant" (Clayton, 1984, p. 193). Snodgrass, Levy-Berger
and Haydon (1985, p. 386) concur:

For planned comparisons, it is not necessary for
the overall ANOVA to be significant in order to
carry them out... Post hoc comparisons, on the
other hand, may not be carried out unless the
overall ANOVA is significant.

Gravetter and Wallnau (1985, p. 423) agree that, "Planned
comparisons can be made even when the overall F-ratio is not
significant."

In fact, "It is not necessary to perform an over-all test of
significance prior to carrying out planned orthogonal t tests"
(Kirk, 1968, p. 73). As Hays (1981, p. 426) suggests:

The F test gives evidence to let us judge if all 
of a set of J - 1 such orthogonal comparisons are 
simultaneously zero in the populations. For this
reason, if planned orthogonal comparisons are
tested separately, the overall F test is not
carried out, and vice versa. 
Swaminathan (in press) presents the same argument with respect
to the MANOVA case:

The often advocated procedure of following up the 
rejection of the null hypothesis with a more 
powerful multiple comparison procedure should be 
discouraged. First, the overall rejection of the 
null hypothesis does not guarantee any meaningful 
contrast among the means will be significant, as 
our example showed. Second..., significant
contrasts may be found even when the null
hypothesis would not have been rejected. Third,
follow up multiple comparison procedures which are
unrelated to the overall test result in an
inflation of the experiment-wise error rate. If
multiple comparisons are of primary interest, a
suitable multiple comparison procedure can be used 
without first performing an overall test. 

A Concrete Heuristic Example

Just as some researchers benefit from seeing heuristic
demonstrations that all parametric significance testing
procedures are subsumed by and can be conducted with canonical
correlation analysis (Thompson, 1985b), it may be helpful to
present a hypothetical analysis demonstrating the utility of
planned orthogonal comparisons using the data presented in Table
1. Table 2 presents the conventional one-way ANOVA results for
the Table 1 data. Even if the researcher
conducted unplanned post hoc tests in the absence of a
statistically significant main effect, none of the unplanned
tests would result in a statistically significant comparison.
However, as noted in Table 3, a statistically significant (p <
0.01) result is isolated for the hypothesis that the mean
attitude-toward-school score of the two school board members
differs from the mean for the remaining 10 subjects.

INSERT TABLES 2 AND 3 ABOUT HERE. 
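For readers who want to reproduce Table 2, the sketch below recomputes the one-way ANOVA from the DV scores in Table 1 using only the Python standard library; the function and dictionary key names are hypothetical. It stops short of the p value, which requires the F distribution (for example, `scipy.stats.f.sf(1.5, 5, 6)` if SciPy is available).

```python
def one_way_anova(groups: dict[int, list[float]]) -> dict[str, float]:
    """One-way fixed-effects ANOVA summary for a dict of level -> scores."""
    scores = [y for ys in groups.values() for y in ys]
    grand = sum(scores) / len(scores)
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    ss_between = sum(len(ys) * (means[g] - grand) ** 2 for g, ys in groups.items())
    ss_error = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    ss_total = sum((y - grand) ** 2 for y in scores)
    df_between = len(groups) - 1
    df_error = len(scores) - len(groups)
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return {
        "ss_between": ss_between,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_error": df_error,
        "ms_between": ms_between,
        "ms_error": ms_error,
        "F": ms_between / ms_error,
        "eta_square": ss_between / ss_total,
    }


# DV scores from Table 1, keyed by LEVEL (two subjects per level).
dv = {1: [10, 20], 2: [10, 20], 3: [10, 20], 4: [10, 20], 5: [10, 20], 6: [25, 35]}

for name, value in one_way_anova(dv).items():
    print(f"{name:>10}: {value:.4f}")
```

The between-groups SOS, mean squares, F, and eta square printed here correspond to the Between and Error rows of Table 2.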



Summary 

The literature regarding the use of multiple comparisons in 
analysis of variance is reviewed. Two reasons why planned 
comparisons are generally superior to the use of unplanned or 
post hoc tests are presented. It is suggested that orthogonal 
tests are generally more useful than non-orthogonal tests. It is 
argued that planned comparisons can be used even when omnibus 
tests are not statistically significant, or in place of such 
tests. Use of planned comparisons tends to result in more 
thoughtful research with greater power against Type II error. 
Table 1

Hypothetical Data for Attitudes Toward School Study (n=12)

                                              Contrast
Group              LEVEL   ID    DV      C1    C2    C3    C4    C5

                     1      1    10       0     0     0     0    -1
                            2    20       0     0     0     0    -1
Teacher Aides        2      3    10       0     0     0    -1    -1
                            4    20       0     0     0    -1    -1
Teachers             3      5    10       0     0    -1    -1    -1
                            6    20       0     0    -1    -1    -1
Principals           4      7    10       0    -1    -1    -1    -1
                            8    20       0    -1    -1    -1    -1
Superintendents      5      9    10      -1    -1    -1    -1    -1
                           10    20      -1    -1    -1    -1    -1
Board Members        6     11    25       1     2     3     4     5
                           12    35       1     2     3     4     5


Table 2
One-way ANOVA Results

Source          SOS    df   Mean Square        F       p   Eta Square
Between    375.0000     5       75.0000   1.5000   .3155       .55556
Error      300.0000     6       50.0000
Total      675.0000    11



Table 3
Planned Comparison Results

Source          SOS    df   Mean Square        F       p   Eta Square
C1            .0000     1         .0000   0.0000               .00000
C2            .0000     1         .0000   0.0000               .00000
C3            .0000     1         .0000   0.0000               .00000
C4            .0000     1         .0000   0.0000               .00000
C5         375.0000     1      375.0000  12.5000   .0054       .55556
Error      300.0000     6       50.0000
Total      675.0000    11
APPENDIX A
Selected SPSS-X Control Cards

TITLE '*****OMNIBUS no POSTHOC no A PRIORI yes'
FILE HANDLE BT/NAME='APRIORI.DTA'
DATA LIST FILE=BT/LEV 1 DV 2-4
* Each of C1 to C5 compares level 6 with a set of lower levels.
COMPUTE C1=0
COMPUTE C2=0
COMPUTE C3=0
COMPUTE C4=0
COMPUTE C5=0
IF (LEV EQ 6)C1=1
IF (LEV EQ 5)C1=-1
IF (LEV EQ 6)C2=2
IF (LEV EQ 4 OR LEV EQ 5)C2=-1
IF (LEV EQ 6)C3=3
IF (LEV GT 2 AND C3 EQ 0)C3=-1
IF (LEV EQ 6)C4=4
IF (LEV GT 1 AND C4 EQ 0)C4=-1
IF (LEV EQ 6)C5=5
IF (C5 EQ 0)C5=-1
* Contrasts are entered one at a time, C5 first.
REGRESSION VARIABLES=DV C1 TO C5/DESCRIPTIVES=ALL/
  CRITERIA=PIN(.95) POUT(.999) TOLERANCE(.00001)/DEPENDENT=DV/
  ENTER C5/ENTER C4/ENTER C3/ENTER C2/ENTER C1/




